# Review

## Review Your Notes

In the lesson The RL Framework: The Problem, you learned how to take a real-world problem and specify it in the language of reinforcement learning. In order to rigorously define a reinforcement learning task, we generally use a Markov Decision Process (MDP) to model the environment. The MDP specifies the rules that the environment uses to respond to the agent's actions, including how much reward to give to the agent in response to its behavior. The agent's goal is to learn how to play by the rules of the environment, in order to maximize reward.
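To make this concrete, here is a minimal sketch (not from the lesson) of how an MDP's one-step dynamics might be written down in Python. Everything in it - the two battery-level states, the actions, the transition probabilities, and the rewards - is invented for illustration; the point is only that an MDP amounts to a table of one-step dynamics.

```python
# A tiny, hypothetical MDP encoded as nested Python dictionaries; all
# state names, actions, probabilities, and rewards below are invented
# for illustration and are not taken from the lesson.
#
# dynamics[state][action] is a list of (probability, next_state, reward)
# triples describing the environment's one-step dynamics.
dynamics = {
    "high": {
        "search": [(0.7, "high", 4.0), (0.3, "low", 4.0)],
        "wait":   [(1.0, "high", 1.0)],
    },
    "low": {
        "search":   [(0.2, "low", 4.0), (0.8, "high", -3.0)],
        "wait":     [(1.0, "low", 1.0)],
        "recharge": [(1.0, "high", 0.0)],
    },
}

# Sanity check: for each (state, action) pair, the transition
# probabilities must sum to 1, as required of an MDP.
for state, actions in dynamics.items():
    for action, transitions in actions.items():
        assert abs(sum(p for p, _, _ in transitions) - 1.0) < 1e-9
```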

Figure: Agent-Environment Interaction

Next, in the lesson The RL Framework: The Solution, you learned how to specify a solution to the reinforcement learning problem. In particular, the optimal policy $\pi_*$ specifies - for each environment state - how the agent should select an action towards its goal of maximizing reward. You learned that the agent could structure its search for an optimal policy by first estimating the optimal action-value function $q_*$; then, once $q_*$ is known, $\pi_*$ is quickly obtained: in each state $s$, an optimal policy need only choose an action $a$ that maximizes $q_*(s, a)$.
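As a small reminder of that last step, here is a minimal sketch, assuming $q_*$ has already been estimated and stored as a NumPy table with one row per state and one column per action (the numbers are invented for illustration):

```python
import numpy as np

# A hypothetical action-value table: q[s, a] holds an estimate of
# q_*(s, a) for 3 states and 2 actions. The numbers are made up.
q = np.array([[1.0, 2.0],
              [3.0, 0.5],
              [0.0, 4.0]])

# Once q_* is known, a greedy (deterministic) optimal policy follows
# immediately: in each state, pick an action that maximizes q_*(s, a).
policy = np.argmax(q, axis=1)

print(policy)  # -> [1 0 1]: the greedy action index for each state
```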

Figure: Value Function

Before continuing with this lesson, please take the time to review your notes and make sure the terminology from the previous two lessons is familiar to you. In particular, peruse the summary page at the end of the lesson The RL Framework: The Problem, along with the corresponding page at the end of The RL Framework: The Solution, to confirm that the listed concepts are familiar.